Basis Function Adaptation in Temporal Difference Reinforcement Learning
Authors: Ishai Menache, Shie Mannor, Nahum Shimkin
Abstract
We examine methods for on-line optimization of the basis functions used by temporal difference Reinforcement Learning algorithms. We concentrate on architectures with a linear parameterization of the value function. Our methods optimize the weights of the network while simultaneously adapting the parameters of the basis functions in order to decrease the Bellman approximation error. A gradient-based method and the Cross Entropy method are applied to the basis function adaptation problem. The performance of the proposed algorithms is evaluated and compared using simulation experiments.

Keywords: Reinforcement Learning, Temporal Difference algorithm, Cross Entropy method, Radial Basis Functions.
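To make the setting concrete, here is a minimal sketch of TD(0) with a linear value function over Gaussian radial basis functions, where a semi-gradient step also adapts the basis centers and widths to reduce the squared Bellman (TD) error. The toy one-dimensional environment, step sizes, and all variable names are assumptions made for illustration, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

n_basis, gamma = 10, 0.95
alpha_w = 0.1                              # step size for the linear weights
alpha_c = 0.01                             # smaller step size for basis parameters
centers = np.linspace(0.0, 1.0, n_basis)   # RBF centers (adapted on-line)
widths = np.full(n_basis, 0.1)             # RBF widths (adapted on-line)
w = np.zeros(n_basis)                      # linear weights of the value function

def phi(s):
    """Gaussian RBF features: phi_i(s) = exp(-(s - c_i)^2 / (2 sigma_i^2))."""
    return np.exp(-0.5 * ((s - centers) / widths) ** 2)

def value(s):
    return w @ phi(s)

for step in range(10_000):
    s = rng.random()                                 # toy state in [0, 1]
    s_next = np.clip(s + rng.normal(0.0, 0.05), 0.0, 1.0)
    r = -abs(s_next - 0.5)                           # toy reward: stay near 0.5

    f = phi(s)
    delta = r + gamma * value(s_next) - value(s)     # TD (Bellman) error

    # Usual TD(0) update of the linear weights.
    w += alpha_w * delta * f

    # Semi-gradient step on the basis parameters to shrink delta^2, using
    # d phi_i / d c_i     = phi_i * (s - c_i)   / sigma_i^2 and
    # d phi_i / d sigma_i = phi_i * (s - c_i)^2 / sigma_i^3.
    centers += alpha_c * delta * w * f * (s - centers) / widths ** 2
    widths += alpha_c * delta * w * f * (s - centers) ** 2 / widths ** 3
    widths = np.maximum(widths, 1e-3)                # keep widths positive
```

A Cross Entropy variant would instead maintain a sampling distribution over the basis parameters, draw candidate bases, score each by the Bellman error obtained after fitting the linear weights, and refit the distribution to the elite samples.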
Similar Resources
Temporal Difference Learning in Continuous Time and Space
A continuous-time, continuous-state version of the temporal difference (TD) algorithm is derived in order to facilitate the application of reinforcement learning to real-world control tasks and neurobiological modeling. An optimal nonlinear feedback control law was also derived using the derivatives of the value function. The performance of the algorithms was tested in a task of swinging up a p...
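For orientation, the continuous-time TD error referred to in this snippet is usually defined as follows; this reconstruction follows the standard formulation (e.g. Doya's) rather than quoting the paper itself:

```latex
% Value function with discount time constant \tau:
V(t) = \int_t^{\infty} e^{-(s-t)/\tau}\, r(s)\, ds
% Differentiating gives \dot{V}(t) = \tfrac{1}{\tau} V(t) - r(t), so the
% continuous-time TD error
\delta(t) = r(t) - \frac{1}{\tau} V(t) + \dot{V}(t)
% vanishes exactly when V is the true value function.
```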
Automatic speech recognition based on adaptation and clustering using temporal-difference learning
This paper describes a novel approach based on online unsupervised adaptation and clustering using temporal-difference (TD) learning. Temporal-difference learning is a reinforcement learning technique and is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. The adaptation progres...
Convergence of Reinforcement Learning with General Function Approximators
A key open problem in reinforcement learning is to assure convergence when using a compact hypothesis class to approximate the value function. Although the standard temporal-difference learning algorithm has been shown to converge when the hypothesis class is a linear combination of fixed basis functions, it may diverge with a general (nonlinear) hypothesis class. This paper describes the Bridg...
Model-based reinforcement learning using on-line clustering
A significant issue in representing reinforcement learning agents in Markov decision processes is how to design efficient feature spaces in order to estimate the optimal policy. The particular study addresses this challenge by proposing a compact framework that employs an on-line clustering approach for building appropriate basis functions. Also, it performs a state-action trajectory analysis to gai...
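As a minimal sketch of the on-line clustering idea, under the assumption of a resource-allocating scheme in a one-dimensional state space (the distance threshold, learning rate, and function name are hypothetical, not the paper's actual framework):

```python
import numpy as np

d_min, eta = 0.15, 0.05   # assumed novelty threshold and center learning rate
centers = []              # RBF centers, grown on-line as states are observed

def observe(s):
    """Allocate a new basis center if s is far from all existing centers,
    otherwise nudge the nearest center toward s."""
    if not centers:
        centers.append(float(s))
        return
    dists = np.abs(np.asarray(centers) - s)
    k = int(np.argmin(dists))
    if dists[k] > d_min:
        centers.append(float(s))                   # uncovered state: new basis
    else:
        centers[k] += eta * (s - centers[k])       # refine the nearest center

for s in np.random.default_rng(1).random(500):
    observe(s)
```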
Transfer Learning via Inter-Task Mappings for Temporal Difference Learning
Temporal difference (TD) learning (Sutton and Barto, 1998) has become a popular reinforcement learning technique in recent years. TD methods, relying on function approximators to generalize learning to novel situations, have had some experimental successes and have been shown to exhibit some desirable properties in theory, but the most basic algorithms have often been found slow in practice. Th...
Journal: Annals of Operations Research
Volume: 134, Issue: -
Pages: -
Publication date: 2005